1.
Sci Rep ; 14(1): 8708, 2024 04 15.
Article in English | MEDLINE | ID: mdl-38622173

ABSTRACT

Recent work has revealed an important role for rare, incompletely penetrant inherited coding variants in neurodevelopmental disorders (NDDs). Additionally, we have previously shown that common variants contribute to risk for rare NDDs. Here, we investigate whether common variants exert their effects by modifying gene expression, using multi-cis-expression quantitative trait loci (cis-eQTL) prediction models. We first performed a transcriptome-wide association study for NDDs using 6987 probands from the Deciphering Developmental Disorders (DDD) study and 9720 controls, and found one gene, RAB2A, that passed multiple testing correction (p = 6.7 × 10⁻⁷). We then investigated whether cis-eQTLs modify the penetrance of putatively damaging, rare coding variants inherited by NDD probands from their unaffected parents in a set of 1700 trios. We found no evidence that unaffected parents transmitting putatively damaging coding variants had higher genetically-predicted expression of the variant-harboring gene than their child. In probands carrying putatively damaging variants in constrained genes, the genetically-predicted expression of these genes in blood was lower than in controls (p = 2.7 × 10⁻³). However, results for proband-control comparisons were inconsistent across different sets of genes, variant filters and tissues. We find limited evidence that common cis-eQTLs modify penetrance of rare coding variants in a large cohort of NDD probands.


Subject(s)
Neurodevelopmental Disorders , Polymorphism, Single Nucleotide , Child , Humans , Penetrance , Quantitative Trait Loci/genetics , Neurodevelopmental Disorders/genetics , Transcriptome
2.
J Clin Endocrinol Metab ; 108(12): e1580-e1587, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37339320

ABSTRACT

CONTEXT: The melanocortin 3 receptor (MC3R) has recently emerged as a critical regulator of pubertal timing, linear growth, and the acquisition of lean mass in humans and mice. In population-based studies, heterozygous carriers of deleterious variants in MC3R report a later onset of puberty than noncarriers. However, the frequency of such variants in patients who present with clinical disorders of pubertal development is currently unknown. OBJECTIVE: This work aimed to determine whether deleterious MC3R variants are more frequently found in patients clinically presenting with constitutional delay of growth and puberty (CDGP) or normosmic idiopathic hypogonadotropic hypogonadism (nIHH). METHODS: We examined the sequence of MC3R in 362 adolescents with a clinical diagnosis of CDGP and 657 patients with nIHH, experimentally characterized the signaling properties of all nonsynonymous variants found and compared their frequency to that in 5774 controls from a population-based cohort. Additionally, we established the relative frequency of predicted deleterious variants in individuals with self-reported delayed vs normally timed menarche/voice-breaking in the UK Biobank cohort. RESULTS: MC3R loss-of-function variants were infrequent but overrepresented in patients with CDGP (8/362 [2.2%]; OR = 4.17; P = .001). There was no strong evidence of overrepresentation in patients with nIHH (4/657 [0.6%]; OR = 1.15; P = .779). In 246 328 women from the UK Biobank, predicted deleterious variants were more frequently found in those self-reporting delayed (aged ≥16 years) vs normal age at menarche (OR = 1.66; P = 3.90E-07). CONCLUSION: We have found evidence that functionally damaging variants in MC3R are overrepresented in individuals with CDGP but are not a common cause of this phenotype.


Subject(s)
Hypogonadism , Puberty, Delayed , Adolescent , Humans , Female , Animals , Mice , Receptor, Melanocortin, Type 3 , Prevalence , Hypogonadism/epidemiology , Hypogonadism/genetics , Hypogonadism/complications , Puberty, Delayed/epidemiology , Puberty, Delayed/genetics , Puberty, Delayed/diagnosis , Puberty/genetics , Growth Disorders/genetics
3.
N Engl J Med ; 388(17): 1559-1571, 2023 Apr 27.
Article in English | MEDLINE | ID: mdl-37043637

ABSTRACT

BACKGROUND: Pediatric disorders include a range of highly penetrant, genetically heterogeneous conditions amenable to genomewide diagnostic approaches. Finding a molecular diagnosis is challenging but can have profound lifelong benefits. METHODS: We conducted a large-scale sequencing study involving more than 13,500 families with probands with severe, probably monogenic, difficult-to-diagnose developmental disorders from 24 regional genetics services in the United Kingdom and Ireland. Standardized phenotypic data were collected, and exome sequencing and microarray analyses were performed to investigate novel genetic causes. We developed an iterative variant analysis pipeline and reported candidate variants to clinical teams for validation and diagnostic interpretation to inform communication with families. Multiple regression analyses were performed to evaluate factors affecting the probability of diagnosis. RESULTS: A total of 13,449 probands were included in the analyses. On average, we reported 1.0 candidate variant per parent-offspring trio and 2.5 variants per singleton proband. Using clinical and computational approaches to variant classification, we made a diagnosis in approximately 41% of probands (5502 of 13,449). Of 3599 probands in trios who received a diagnosis by clinical assertion, approximately 76% had a pathogenic de novo variant. Another 22% of probands (2997 of 13,449) had variants of uncertain significance in genes that were strongly linked to monogenic developmental disorders. Recruitment in a parent-offspring trio had the largest effect on the probability of diagnosis (odds ratio, 4.70; 95% confidence interval [CI], 4.16 to 5.31). 
Probands were less likely to receive a diagnosis if they were born extremely prematurely (i.e., 22 to 27 weeks' gestation; odds ratio, 0.39; 95% CI, 0.22 to 0.68), had in utero exposure to antiepileptic medications (odds ratio, 0.44; 95% CI, 0.29 to 0.67), had mothers with diabetes (odds ratio, 0.52; 95% CI, 0.41 to 0.67), or were of African ancestry (odds ratio, 0.51; 95% CI, 0.31 to 0.78). CONCLUSIONS: Among probands with severe, probably monogenic, difficult-to-diagnose developmental disorders, multimodal analysis of genomewide data had good diagnostic power, even after previous attempts at diagnosis. (Funded by the Health Innovation Challenge Fund and Wellcome Sanger Institute.).


Subject(s)
Genomics , Rare Diseases , Child , Humans , Exome , Ireland/epidemiology , United Kingdom/epidemiology , Rare Diseases/diagnosis , Rare Diseases/epidemiology , Rare Diseases/genetics , Oligonucleotide Array Sequence Analysis , Genetic Association Studies , Neurodevelopmental Disorders/diagnosis , Neurodevelopmental Disorders/genetics , Congenital Abnormalities/diagnosis , Congenital Abnormalities/genetics , Growth Disorders/diagnosis , Growth Disorders/genetics , Facies , Child Behavior Disorders/diagnosis , Child Behavior Disorders/genetics , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics
4.
Am J Hum Genet ; 108(11): 2186-2194, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34626536

ABSTRACT

Structural variation (SV) describes a broad class of genetic variation greater than 50 bp in size. SVs can cause a wide range of genetic diseases and are prevalent in rare developmental disorders (DDs). Individuals presenting with DDs are often referred for diagnostic testing with chromosomal microarrays (CMAs) to identify large copy-number variants (CNVs) and/or with single-gene, gene-panel, or exome sequencing (ES) to identify single-nucleotide variants, small insertions/deletions, and CNVs. However, individuals with pathogenic SVs undetectable by conventional analysis often remain undiagnosed. Consequently, we have developed the tool InDelible, which interrogates short-read sequencing data for split-read clusters characteristic of SV breakpoints. We applied InDelible to 13,438 probands with severe DDs recruited as part of the Deciphering Developmental Disorders (DDD) study and discovered 63 rare, damaging variants in genes previously associated with DDs missed by standard SNV, indel, or CNV discovery approaches. Clinical review of these 63 variants determined that about half (30/63) were plausibly pathogenic. InDelible was particularly effective at ascertaining variants between 21 and 500 bp in size and increased the total number of potentially pathogenic variants identified by DDD in this size range by 42.9%. Of particular interest were seven confirmed de novo variants in MECP2, which represent 35.0% of all de novo protein-truncating variants in MECP2 among DDD study participants. InDelible provides a framework for the discovery of pathogenic SVs that are most likely missed by standard analytical workflows and has the potential to improve the diagnostic yield of ES across a broad range of genetic diseases.


Subject(s)
Developmental Disabilities/diagnosis , Developmental Disabilities/genetics , Exome Sequencing/methods , Child , Female , Humans , Male , Methyl-CpG-Binding Protein 2/genetics
5.
Am J Hum Genet ; 108(6): 1083-1094, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34022131

ABSTRACT

Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.


Subject(s)
5' Untranslated Regions , Developmental Disabilities/etiology , Genetic Predisposition to Disease , Loss of Function Mutation , Child , Cohort Studies , DNA Copy Number Variations , Developmental Disabilities/pathology , Humans , MEF2 Transcription Factors/genetics , Exome Sequencing
6.
Nat Commun ; 12(1): 627, 2021 01 27.
Article in English | MEDLINE | ID: mdl-33504798

ABSTRACT

Over 130 X-linked genes have been robustly associated with developmental disorders, and X-linked causes have been hypothesised to underlie the higher developmental disorder rates in males. Here, we evaluate the burden of X-linked coding variation in 11,044 developmental disorder patients, and find a similar rate of X-linked causes in males and females (6.0% and 6.9%, respectively), indicating that such variants do not account for the 1.4-fold male bias. We develop an improved strategy to detect X-linked developmental disorders and identify 23 significant genes, all of which were previously known, consistent with our inference that the vast majority of the X-linked burden is in known developmental disorder-associated genes. Importantly, we estimate that, in male probands, only 13% of inherited rare missense variants in known developmental disorder-associated genes are likely to be pathogenic. Our results demonstrate that statistical analysis of large datasets can refine our understanding of modes of inheritance for individual X-linked disorders.


Subject(s)
Developmental Disabilities/genetics , Genes, X-Linked , Genetic Diseases, X-Linked/genetics , Genetic Variation , Chromosomes, Human, X/genetics , Female , Genes, Recessive , Humans , Inheritance Patterns/genetics , Male , Multifactorial Inheritance/genetics , Mutation/genetics , Phenotype , Sex Characteristics
7.
Genet Med ; 23(3): 571-575, 2021 03.
Article in English | MEDLINE | ID: mdl-33149276

ABSTRACT

PURPOSE: Automated variant filtering is an essential part of diagnostic genome-wide sequencing but may generate false negative results. We sought to investigate whether some previously identified pathogenic variants may be being routinely excluded by standard variant filtering pipelines. METHODS: We evaluated variants that were previously classified as pathogenic or likely pathogenic in ClinVar in known developmental disorder genes using exome sequence data from the Deciphering Developmental Disorders (DDD) study. RESULTS: Of these ClinVar pathogenic variants, 3.6% were identified among 13,462 DDD probands, and 1134/1352 (83.9%) had already been independently communicated to clinicians using DDD variant filtering pipelines as plausibly pathogenic. The remaining 218 variants failed consequence, inheritance, or other automated variant filters. Following clinical review of these additional variants, we were able to identify 112 variants in 107 (0.8%) DDD probands as potential diagnoses. CONCLUSION: Lower minor allele frequency (<0.0005%) and higher gold star review status in ClinVar (>1 star) are good predictors of a previously identified variant being plausibly diagnostic for developmental disorders. However, around half of previously identified pathogenic variants excluded by automated variant filtering did not appear to be disease-causing, underlining the continued need for clinical evaluation of candidate variants as part of the diagnostic process.
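The percentages in this abstract follow directly from the counts it quotes. The short Python sketch below recomputes them as a sanity check; the variable names and the `pct` helper are illustrative, and only the counts come from the abstract.

```python
# Counts quoted in the abstract above; variable names are illustrative.
clinvar_variants_found = 1352      # ClinVar pathogenic variants seen in DDD probands
already_reported = 1134            # already communicated to clinicians
diagnostic_after_review = 112      # variants accepted as potential diagnoses
probands_total = 13462
probands_with_new_diagnosis = 107


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Variants that failed the automated filters.
failed_filters = clinvar_variants_found - already_reported
print(failed_filters)                                        # 218
print(pct(already_reported, clinvar_variants_found))         # 83.9
print(pct(probands_with_new_diagnosis, probands_total))      # 0.8
# Share of filtered-out variants judged not disease-causing ("around half").
print(pct(failed_filters - diagnostic_after_review, failed_filters))  # 48.6
```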


Subject(s)
Databases, Genetic , Exome , Gene Frequency , Humans , Exome Sequencing
8.
Nature ; 586(7831): 757-762, 2020 10.
Article in English | MEDLINE | ID: mdl-33057194

ABSTRACT

De novo mutations in protein-coding genes are a well-established cause of developmental disorders. However, genes known to be associated with developmental disorders account for only a minority of the observed excess of such de novo mutations. Here, to identify previously undescribed genes associated with developmental disorders, we integrate healthcare and research exome-sequence data from 31,058 parent-offspring trios of individuals with developmental disorders, and develop a simulation-based statistical test to identify gene-specific enrichment of de novo mutations. We identified 285 genes that were significantly associated with developmental disorders, including 28 that had not previously been robustly associated with developmental disorders. Although we detected more genes associated with developmental disorders, much of the excess of de novo mutations in protein-coding genes remains unaccounted for. Modelling suggests that more than 1,000 genes associated with developmental disorders have not yet been described, many of which are likely to be less penetrant than the currently known genes. Research access to clinical diagnostic datasets will be critical for completing the map of genes associated with developmental disorders.
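To illustrate what "gene-specific enrichment of de novo mutations" means, here is a minimal Python sketch of a much simpler, commonly used stand-in: a one-sided Poisson test of the observed mutation count against the count expected by chance. This is not the simulation-based test the abstract describes. The per-gene mutation rate and the observed count are hypothetical values chosen for illustration; only the trio count comes from the abstract.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 - P(X <= observed - 1), so very small p-values
    (below roughly 1e-16) lose precision in floating point.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)      # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


n_trios = 31058                     # from the abstract
rate_per_chromosome = 1e-5          # HYPOTHETICAL per-gene de novo rate
observed_de_novos = 6               # HYPOTHETICAL observed count in one gene

# Each child carries two copies of an autosomal gene.
expected_de_novos = 2 * n_trios * rate_per_chromosome
p_value = poisson_upper_tail(observed_de_novos, expected_de_novos)
print(f"expected {expected_de_novos:.2f}, observed {observed_de_novos}, p = {p_value:.2e}")
```

In a real analysis the p-value would be compared with a threshold corrected for the number of genes tested.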


Subject(s)
DNA Mutational Analysis , Data Analysis , Databases, Genetic , Datasets as Topic , Delivery of Health Care/statistics & numerical data , Developmental Disabilities/genetics , Genetic Diseases, Inborn/genetics , Cohort Studies , DNA Copy Number Variations/genetics , Developmental Disabilities/diagnosis , Europe , Female , Genetic Diseases, Inborn/diagnosis , Germ-Line Mutation/genetics , Haploinsufficiency/genetics , Humans , Male , Mutation, Missense/genetics , Penetrance , Perinatal Death , Sample Size
9.
Lancet ; 393(10173): 747-757, 2019 02 23.
Article in English | MEDLINE | ID: mdl-30712880

ABSTRACT

BACKGROUND: Fetal structural anomalies, which are detected by ultrasonography, have a range of genetic causes, including chromosomal aneuploidy, copy number variations (CNVs; which are detectable by chromosomal microarrays), and pathogenic sequence variants in developmental genes. Testing for aneuploidy and CNVs is routine during the investigation of fetal structural anomalies, but there is little information on the clinical usefulness of genome-wide next-generation sequencing in the prenatal setting. We therefore aimed to evaluate the proportion of fetuses with structural abnormalities that had identifiable variants in genes associated with developmental disorders when assessed with whole-exome sequencing (WES). METHODS: In this prospective cohort study, two groups in Birmingham and London recruited patients from 34 fetal medicine units in England and Scotland. We used whole-exome sequencing (WES) to evaluate the presence of genetic variants in developmental disorder genes (diagnostic genetic variants) in a cohort of fetuses with structural anomalies and samples from their parents, after exclusion of aneuploidy and large CNVs. Women were eligible for inclusion if they were undergoing invasive testing for identified nuchal translucency or structural anomalies in their fetus, as detected by ultrasound after 11 weeks of gestation. The partners of these women also had to consent to participate. Sequencing results were interpreted with a targeted virtual gene panel for developmental disorders that comprised 1628 genes. Genetic results related to fetal structural anomaly phenotypes were then validated and reported postnatally. The primary endpoint, which was assessed in all fetuses, was the detection of diagnostic genetic variants considered to have caused the fetal developmental anomaly. FINDINGS: The cohort was recruited between Oct 22, 2014, and June 29, 2017, and clinical data were collected until March 31, 2018. 
After exclusion of fetuses with aneuploidy and CNVs, 610 fetuses with structural anomalies and 1202 matched parental samples (analysed as 596 fetus-parental trios, including two sets of twins, and 14 fetus-parent dyads) were analysed by WES. After bioinformatic filtering and prioritisation according to allele frequency and effect on protein and inheritance pattern, 321 genetic variants (representing 255 potential diagnoses) were selected as potentially pathogenic genetic variants (diagnostic genetic variants), and these variants were reviewed by a multidisciplinary clinical review panel. A diagnostic genetic variant was identified in 52 (8·5%; 95% CI 6·4-11·0) of 610 fetuses assessed and an additional 24 (3·9%) fetuses had a variant of uncertain significance that had potential clinical usefulness. Detection of diagnostic genetic variants enabled us to distinguish between syndromic and non-syndromic fetal anomalies (eg, congenital heart disease only vs a syndrome with congenital heart disease and learning disability). Diagnostic genetic variants were present in 22 (15·4%) of 143 fetuses with multisystem anomalies (ie, more than one fetal structural anomaly), nine (11·1%) of 81 fetuses with cardiac anomalies, and ten (15·4%) of 65 fetuses with skeletal anomalies; these phenotypes were most commonly associated with diagnostic variants. However, diagnostic genetic variants were least common in fetuses with isolated increased nuchal translucency (≥4·0 mm) in the first trimester (in three [3·2%] of 93 fetuses). INTERPRETATION: WES facilitates genetic diagnosis of fetal structural anomalies, which enables more accurate predictions of fetal prognosis and risk of recurrence in future pregnancies. However, the overall detection of diagnostic genetic variants in a prospectively ascertained cohort with a broad range of fetal structural anomalies is lower than that suggested by previous smaller-scale studies of fewer phenotypes. 
WES improved the identification of genetic disorders in fetuses with structural abnormalities; however, before clinical implementation, careful consideration should be given to case selection to maximise clinical usefulness. FUNDING: UK Department of Health and Social Care and The Wellcome Trust.
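The headline yield of 52 diagnoses in 610 fetuses can be recomputed together with an approximate 95% confidence interval. The abstract does not say which interval method was used, so the Python sketch below assumes a Wilson score interval; its bounds may differ slightly from the published 6·4-11·0. The function name is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts from the abstract: 52 of 610 fetuses had a diagnostic variant.
low, high = wilson_interval(52, 610)
print(f"{100 * 52 / 610:.1f}% (approx. 95% CI {100 * low:.1f} to {100 * high:.1f})")
```

The same helper applies to the subgroup yields, for example `wilson_interval(22, 143)` for multisystem anomalies, where the smaller denominators give wider intervals.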


Subject(s)
Abnormal Karyotype/statistics & numerical data , Congenital Abnormalities/genetics , Exome Sequencing/statistics & numerical data , Fetal Development/genetics , Fetus/abnormalities , Abnormal Karyotype/embryology , Abortion, Eugenic/statistics & numerical data , Abortion, Spontaneous/epidemiology , Congenital Abnormalities/diagnosis , Congenital Abnormalities/epidemiology , DNA Copy Number Variations/genetics , Female , Fetus/diagnostic imaging , Humans , Infant, Newborn , Live Birth/epidemiology , Male , Nuchal Translucency Measurement , Parents , Perinatal Death/etiology , Pregnancy , Prospective Studies , Stillbirth/epidemiology , Exome Sequencing/methods
10.
Genet Med ; 21(5): 1065-1073, 2019 05.
Article in English | MEDLINE | ID: mdl-30293990

ABSTRACT

PURPOSE: To determine the diagnostic yield of combined exome sequencing (ES) and autopsy in fetuses/neonates with prenatally identified structural anomalies resulting in termination of pregnancy, intrauterine, neonatal, or early infant death. METHODS: ES was undertaken in 27 proband/parent trios following full autopsy. Candidate pathogenic variants were classified by a multidisciplinary clinical review panel using American College of Medical Genetics and Genomics (ACMG) guidelines. RESULTS: A genetic diagnosis was established in ten cases (37%). Pathogenic/likely pathogenic variants were identified in nine different genes including four de novo autosomal dominant, three homozygous autosomal recessive, two compound heterozygous autosomal recessive, and one X-linked. KMT2D variants (associated with Kabuki syndrome postnatally) occurred in two cases. Pathogenic variants were identified in 5/13 (38%) cases with multisystem anomalies, in 2/4 (50%) cases with fetal akinesia deformation sequence, and in 1/4 (25%) cases each with cardiac and brain anomalies and hydrops fetalis. No pathogenic variants were detected in fetuses with genitourinary (1), skeletal (1), or abdominal (1) abnormalities. CONCLUSION: This cohort demonstrates the clinical utility of molecular autopsy with ES to identify an underlying genetic cause in structurally abnormal fetuses/neonates. These molecular findings provided parents with an explanation of the developmental abnormality, delineated the recurrence risks, and assisted the management of subsequent pregnancies.


Subject(s)
Congenital Abnormalities/genetics , Fetal Diseases/genetics , Prenatal Diagnosis/methods , Autopsy/methods , Cohort Studies , Congenital Abnormalities/diagnosis , Exome/genetics , Female , Fetal Diseases/diagnosis , Fetus/diagnostic imaging , Humans , Infant, Newborn , Male , Pregnancy , Exome Sequencing/methods
11.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subject(s)
Databases, Protein , Proteins/classification , Proteome/chemistry , Sequence Alignment , Sequence Analysis, Protein , Molecular Sequence Annotation
12.
Methods Mol Biol ; 1269: 349-63, 2015.
Article in English | MEDLINE | ID: mdl-25577390

ABSTRACT

The primary task of the Rfam database is to collate experimentally validated noncoding RNA (ncRNA) sequences from the published literature and facilitate the prediction and annotation of new homologues in novel nucleotide sequences. We group homologous ncRNA sequences into "families" and related families are further grouped into "clans." We collate and manually curate data cross-references for these families from other databases and external resources. Our Web site offers researchers a simple interface to Rfam and provides tools with which to annotate their own sequences using our covariance models (CMs), through our tools for searching, browsing, and downloading information on Rfam families. In this chapter, we will work through examples of annotating a query sequence, collating family information, and searching for data.


Subject(s)
Computational Biology/methods , RNA, Untranslated/chemistry , Databases, Nucleic Acid , Sequence Analysis, RNA , Software
13.
Nucleic Acids Res ; 43(Database issue): D130-7, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25392425

ABSTRACT

The Rfam database (available at http://rfam.xfam.org) is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.


Subject(s)
Databases, Nucleic Acid , RNA, Untranslated/chemistry , Genomics , Internet , Molecular Sequence Annotation , Nucleic Acid Conformation , Nucleotide Motifs , RNA, Long Noncoding/chemistry , RNA, Untranslated/classification , Software
14.
BMC Bioinformatics ; 15: 196, 2014 Jun 17.
Article in English | MEDLINE | ID: mdl-24938123

ABSTRACT

BACKGROUND: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain. RESULTS: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333. CONCLUSIONS: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.


Subject(s)
Glycoside Hydrolases/chemistry , Bacteroides/chemistry , Bacteroides/enzymology , Biocatalysis , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Models, Molecular , Phylogeny , Protein Structure, Tertiary , Structural Homology, Protein
15.
BMC Bioinformatics ; 15: 112, 2014 Apr 17.
Article in English | MEDLINE | ID: mdl-24742328

ABSTRACT

BACKGROUND: Bacteroides spp. form a significant part of our gut microbiome and are well known for optimized metabolism of diverse polysaccharides. Initial analysis of the archetypal Bacteroides thetaiotaomicron genome identified 172 glycosyl hydrolases and a large number of uncharacterized proteins associated with polysaccharide metabolism. RESULTS: BT_1012 from Bacteroides thetaiotaomicron VPI-5482 is a protein of unknown function and a member of a large protein family consisting entirely of uncharacterized proteins. Initial sequence analysis predicted that this protein has two domains, one at the N-terminus and one at the C-terminus. A PSI-BLAST search found over 150 full-length homologs and over 90 half-size homologs consisting only of the N-terminal domain. The experimentally determined three-dimensional structure of the BT_1012 protein confirms its two-domain architecture, and structural analysis of both domains suggests their specific functions. The N-terminal domain is a putative catalytic domain with significant similarity to known glycoside hydrolases. The C-terminal domain has a beta-sandwich fold typically found in C-terminal domains of other glycosyl hydrolases; however, these domains are typically involved in substrate binding. We describe the structure of the BT_1012 protein and discuss its sequence-structure relationship and their possible functional implications. CONCLUSIONS: Structural and sequence analyses of the BT_1012 protein identify it as a glycosyl hydrolase, expanding an already impressive catalog of enzymes involved in polysaccharide metabolism in Bacteroides spp. Based on this we have renamed the Pfam families representing the two domains found in the BT_1012 protein, PF13204 and PF12904, as putative glycoside hydrolase and glycoside hydrolase-associated C-terminal domain respectively.


Subject(s)
Bacterial Proteins/chemistry , Glycoside Hydrolases/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Bacteroides/enzymology , Computational Biology , Gastrointestinal Tract/microbiology , Genomics , Glycoside Hydrolases/genetics , Humans , Protein Structure, Tertiary
16.
BMC Bioinformatics ; 15: 1, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24383880

ABSTRACT

BACKGROUND: The Acel_2062 protein from Acidothermus cellulolyticus is a protein of unknown function. Initial sequence analysis predicted that it was a metallopeptidase from the presence of a motif conserved amongst the Asp-zincins, which are peptidases that contain a single, catalytic zinc ion ligated by the histidines and aspartic acid within the motif (HEXXHXXGXXD). The Acel_2062 protein was chosen by the Joint Center for Structural Genomics for crystal structure determination to explore novel protein sequence space and structure-based function annotation. RESULTS: The crystal structure confirmed that the Acel_2062 protein consisted of a single, zincin-like metallopeptidase-like domain. The Met-turn, a structural feature thought to be important for a Met-zincin because it stabilizes the active site, is absent, and its stabilizing role may have been conferred to the C-terminal Tyr113. In our crystallographic model there are two molecules in the asymmetric unit, and size-exclusion chromatography indicates that the protein dimerizes in solution. A water molecule is present in the putative zinc-binding site in one monomer, which is replaced by one of two observed conformations of His95 in the other. CONCLUSIONS: The Acel_2062 protein is structurally related to the zincins. It contains the minimum structural features of a member of this protein superfamily, and can be described as a "mini-zincin". There is a striking parallel with the structure of a mini-Glu-zincin, which represents the minimum structure of a Glu-zincin (a metallopeptidase in which the third zinc ligand is a glutamic acid). Phylogenetic analysis suggests that the mini-zincins are derived from larger proteins rather than representing an ancestral state.
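The Asp-zincin motif quoted in this abstract (HEXXHXXGXXD, with X standing for any residue) maps naturally onto a regular expression. The Python sketch below scans a protein sequence for it. The function name and the toy sequence are illustrative assumptions; the toy sequence is not the real Acel_2062 sequence.

```python
import re

# HEXXHXXGXXD from the abstract; each X is matched as any single uppercase letter.
ASP_ZINCIN_MOTIF = re.compile(r"HE[A-Z]{2}H[A-Z]{2}G[A-Z]{2}D")


def find_motif(sequence):
    """Return (0-based start, matched text) for each non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in ASP_ZINCIN_MOTIF.finditer(sequence.upper())]


# Made-up toy sequence for illustration only.
toy_sequence = "MKTAHEAAHLLGLKDPRS"
print(find_motif(toy_sequence))  # [(4, 'HEAAHLLGLKD')]
```

A motif match alone only suggests a candidate metallopeptidase; as the abstract shows, structural evidence was needed to confirm the zincin-like fold.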


Subject(s)
Bacterial Proteins/chemistry , Metalloproteases/chemistry , Zinc/chemistry , Actinomycetales/chemistry , Actinomycetales/enzymology , Amino Acid Motifs , Amino Acid Sequence , Bacterial Proteins/metabolism , Dimerization , Metalloproteases/metabolism , Models, Molecular , Molecular Sequence Data , Phylogeny , Protein Subunits , Sequence Alignment , Zinc/metabolism
17.
Nucleic Acids Res ; 42(Database issue): D222-30, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24288371

ABSTRACT

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.


Subject(s)
Databases, Protein , Sequence Alignment , Sequence Analysis, Protein , Internet , Intrinsically Disordered Proteins/chemistry , Protein Conformation , Proteins/chemistry , Proteins/classification , Proteins/genetics , Proteome/chemistry , Sequence Analysis, DNA
18.
BMC Bioinformatics ; 14: 341, 2013 Nov 26.
Article in English | MEDLINE | ID: mdl-24274019

ABSTRACT

BACKGROUND: A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family. RESULTS: JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome. CONCLUSIONS: We propose a model for the substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation resistance of Deinococcus radiodurans are discussed.


Subject(s)
Bacterial Proteins/metabolism , Deinococcus/chemistry , Deinococcus/metabolism , Lactic Acid/metabolism , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Crystallography, X-Ray , Deinococcus/genetics , Humans , Microbiota/radiation effects , Molecular Sequence Data , Protein Structure, Tertiary
19.
BMC Bioinformatics ; 14: 327, 2013 Nov 19.
Article in English | MEDLINE | ID: mdl-24246060

ABSTRACT

BACKGROUND: The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found. RESULTS: Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of representatives of each of these domains: BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C], and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ] have been solved by X-ray crystallography. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to present a hypothesis concerning their role. CONCLUSIONS: Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).


Subject(s)
Bacterial Proteins/chemistry , Nucleocytoplasmic Transport Proteins/chemistry , Peptide Mapping/methods , Bacteroides/chemistry , Campylobacter jejuni/chemistry , Catalytic Domain , Crystallography, X-Ray , Ligands , Protein Folding , Protein Multimerization , Protein Structure, Tertiary , Ruminococcus/chemistry
20.
BMC Bioinformatics ; 14: 265, 2013 Sep 03.
Article in English | MEDLINE | ID: mdl-24004689

ABSTRACT

BACKGROUND: Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology. RESULTS: We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria. CONCLUSIONS: Based on our analyses we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity with peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.


Subject(s)
Bacterial Proteins/chemistry , Databases, Protein , Legionella pneumophila/chemistry , Amino Acid Sequence , Bacterial Proteins/genetics , Legionella pneumophila/genetics , Molecular Sequence Data , Protein Structure, Tertiary , Sequence Alignment , Sequence Analysis, Protein